MDEV-38877: Unnecessary filesort on derived table materialization #4722

Open
OmarGamal10 wants to merge 1 commit into MariaDB:12.3 from OmarGamal10:mdev-38877


OmarGamal10 commented Mar 2, 2026

Why does it happen?

The optimizer flags outer references in a subquery as constants. On every re-execution of the subquery, the index is then skipped for ordering, because filtering on a constant means all matching rows already share the same value.

  • Consider this example
    SELECT * FROM t2 JOIN (SELECT groups_20, MAX(b) FROM t1 GROUP BY groups_20) DT ON t2.a = groups_20;

  • After hours of investigation, I found that the index is bypassed because table->const_key_parts incorrectly flags the [GROUP/ORDER] BY column as a constant. Treating the column as a constant is correct for nested-loop / lateral joins: the subquery is re-executed for each outer row, so the join column is effectively a literal constant in that context, which makes index usage/sorting redundant.
    However, if the optimizer decides to materialize the subquery, the subquery is executed once to build a table. In that context the column is a variable, not a constant.

    • The constant flag tricks the optimizer into assuming the data is already sorted.
    • The optimizer skips the index.
    • The optimizer finds out that data needs grouping/sorting and is not constant, falling back to a full table scan and a filesort.
  • The fix is a guard condition to prevent treating an outer reference as a constant in case of derived tables.

  • Before: [image: Before]
  • After: [image: After]
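
To make the mechanism in the bullets above concrete, here is a minimal toy model in C++. It is a sketch under stated assumptions, not the server code: `ToyIndex`, `index_provides_order()` and `materialized_group_by_uses_index()` are hypothetical names, the index is assumed to have `groups_20` as its only key part, and the ordering check is a deliberate simplification of what a function like test_if_order_by_key() has to decide.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model only: hypothetical stand-ins, not MariaDB's TABLE/KEY structures.
enum Column { GROUPS_20, B };

struct ToyIndex {
  Column key_parts[4];            // indexed columns, in index order
  std::size_t n_parts;            // number of key parts actually in use
  std::uint64_t const_key_parts;  // bit i set => key part i is treated as constant
};

// Simplified ordering check: leading key parts flagged constant are skipped,
// then every GROUP/ORDER BY column must match the next key part in turn.
constexpr bool index_provides_order(const ToyIndex &idx,
                                    const Column *order_cols,
                                    std::size_t n_order) {
  std::size_t i = 0;
  while (i < idx.n_parts && ((idx.const_key_parts >> i) & 1)) ++i;
  for (std::size_t j = 0; j < n_order; ++j, ++i) {
    if (i >= idx.n_parts || idx.key_parts[i] != order_cols[j]) return false;
  }
  return true;
}

// Models GROUP BY groups_20 inside a derived table that is materialized once.
// Without the guard, the outer reference marks key part 0 as constant;
// with the guard, the bit is left clear because the column is not constant here.
constexpr bool materialized_group_by_uses_index(bool apply_guard) {
  ToyIndex idx{{GROUPS_20}, 1, 0};
  if (!apply_guard) {
    idx.const_key_parts |= std::uint64_t{1} << 0;
  }
  Column order[1] = {GROUPS_20};
  return index_provides_order(idx, order, 1);
}
```

In this model, `materialized_group_by_uses_index(false)` reports that the index cannot supply the grouping order (the filesort case), while `materialized_group_by_uses_index(true)` reports that it can.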

Fixes unnecessary filesort on derived tables when ordered/grouped by a
field in the key. The data is inherently sorted, so wrapping the result set
in a filesort is redundant.
grooverdan added the "External Contribution" label (all PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements) Mar 2, 2026
@grooverdan (Member)

From discussion:
https://mariadb.zulipchat.com/#narrow/channel/118759-general/topic/MDEV-38877/with/576866739

The following is from @OmarGamal10; @mariadb-RexJohnston, I hope I can get you to provide guidance on it.

"I've verified that the skipped filesorts in the failing tests are correct. The cases are similar to the one I worked on, and I verified correctness by modifying the test locally to check the returned set.

The filter estimate, on the other hand, was for some reason correct before my fix: I checked the real percentage of the filtered columns and the old estimate was right. For this query the plan also showed that the materialized table is lateral derived, not just derived, so I figured that something related to it being lateral, combined with my fix (not flagging the constant), messed the estimate up. I have spent about 2.5 hours trying to find in the code how to tell whether a table is both derived and lateral, so that I can add it to the condition. There are variables for this, but for some reason none of them gets set at all. I think that is because the query syntactically has no "Lateral" keyword; it is just a subquery that the optimizer probably rewrote to be lateral, which I can't seem to catch at all."

@mariadb-RexJohnston (Member)

Hi,

You are right to point out the involvement of the split_materialized optimizer flag.
By default it is on.

MariaDB [test]> SET optimizer_switch='split_materialized=on';
Query OK, 0 rows affected (0.002 sec)

MariaDB [test]> explain select a, sum(b) from    (     select groups_20 from t1      group by groups_20       having count(*)  != 1000   ) DT    join t2 on a = groups_20 group by a;
+------+-------------+------------+------+---------------+------+---------+-----------+-------+----------------------------------------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref       | rows  | Extra                                        |
+------+-------------+------------+------+---------------+------+---------+-----------+-------+----------------------------------------------+
|    1 | PRIMARY     | t2         | ALL  | a             | NULL | NULL    | NULL      | 1001  | Using where; Using temporary; Using filesort |
|    1 | PRIMARY     | <derived2> | ref  | key0          | key0 | 4       | test.t2.a | 1     |                                              |
|    2 | DERIVED     | t1         | ALL  | PRIMARY       | NULL | NULL    | NULL      | 19735 | Using temporary; Using filesort              |
+------+-------------+------------+------+---------------+------+---------+-----------+-------+----------------------------------------------+
3 rows in set (0.007 sec)

If we switch it off, the problem vanishes.

MariaDB [test]> SET optimizer_switch='split_materialized=off';
Query OK, 0 rows affected (0.002 sec)

MariaDB [test]> explain select a, sum(b) from    (     select groups_20 from t1      group by groups_20       having count(*)  != 1000   ) DT    join t2 on a = groups_20 group by a;
+------+-------------+------------+-------+---------------+---------+---------+--------------+-------+---------------------------------+
| id   | select_type | table      | type  | possible_keys | key     | key_len | ref          | rows  | Extra                           |
+------+-------------+------------+-------+---------------+---------+---------+--------------+-------+---------------------------------+
|    1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL         | 20    | Using temporary; Using filesort |
|    1 | PRIMARY     | t2         | ref   | a             | a       | 5       | DT.groups_20 | 1     |                                 |
|    2 | DERIVED     | t1         | index | NULL          | PRIMARY | 8       | NULL         | 19735 |                                 |
+------+-------------+------------+-------+---------------+---------+---------+--------------+-------+---------------------------------+
3 rows in set (0.006 sec)

Indeed, at the point where the problematic table->const_key_parts is set, the stack trace shows sort_and_filter_keyuse() being called from JOIN::add_keyuses_for_splitting(), even though split materialization isn't used.

test_if_order_by_key() then later uses this field to determine the requirement for a temporary table.

It would be reasonable to conclude that const_key_parts is set above for accessing t1 from t2 using split materialization. Since we did not decide to use this access method, the safest fix would be to reset this field to the value it had before the call to JOIN::add_keyuses_for_splitting(), once we have decided NOT to use split materialization, perhaps in JOIN_TAB::fix_splitting().
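
The save-and-restore idea can be sketched with a small toy model in C++. This is only an illustration under assumptions: `ToyTable` and `plan_derived()` are hypothetical names, and where the real save and restore would live (for example around JOIN::add_keyuses_for_splitting() and in JOIN_TAB::fix_splitting()) is not settled by this sketch.

```cpp
#include <cstdint>

// Toy model only: hypothetical stand-in, not MariaDB's TABLE or JOIN_TAB.
struct ToyTable {
  std::uint64_t const_key_parts;  // bit i set => key part i is treated as constant
};

// Models the planning sequence: remember the field, set the bit while split
// materialization is being considered, and put the old value back if that
// access method is not chosen in the end.
constexpr std::uint64_t plan_derived(ToyTable table, unsigned part,
                                     bool split_chosen) {
  const std::uint64_t saved = table.const_key_parts;  // value before the splitting setup
  table.const_key_parts |= std::uint64_t{1} << part;  // assumption made for splitting
  if (!split_chosen) {
    table.const_key_parts = saved;                    // undo: the assumption no longer holds
  }
  return table.const_key_parts;
}
```

With this shape, any bits that were already set before the splitting setup survive the restore, and the extra bit only remains when split materialization is actually used.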
